---
title: Modeling algorithms
description: Provides a list of the supervised and unsupervised modeling algorithms DataRobot supports.
---


# Modeling algorithms {: #modeling-algorithms }


DataRobot supports a comprehensive library of pre- and post-processing (modeling) steps, which combine to make up the model [blueprint](blueprints){ target=_blank }. Which steps are run, or are available in the [model repository](repository){ target=_blank }, depends on the dataset. This range of pre- and post-processing combinations allows DataRobot to build a Leaderboard of your best modeling options. Examples of this flexibility include logistic regression with and without PCA as a pre-processor, and random forests with and without a greedy search for interaction terms.

As a result, DataRobot likely runs each model in the lists below two to five times, each time with different pre-processing and/or variable selection. The following sections list the relevant algorithms:

* [Pre-processing](#pre-processing)
* [Linear or additive models](#linear-or-additive-models)
* [Tree-based models](#tree-based-models)
* [Deep learning and foundational models](#deep-learning-and-foundational-models)
* [Time series-specific models](#time-series-specific-models)
* [Unsupervised models](#unsupervised-models)
* [Other model types](#other-model-types)

## Pre-processing tasks {: #pre-processing }

#### Categorical

* Bühlmann credibility estimates for high-cardinality features
* Categorical embedding
* Category count
* One-hot encoding
* Ordinal encoding of categorical variables
* Univariate credibility estimates with L2
* Efficient, sparse one-hot encoding for extremely high cardinality categorical variables
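
As a rough illustration of three of the encodings listed above (ordinal encoding, category count, and sparse one-hot encoding), the following standard-library sketch may help. The function names and the coordinate-list layout are assumptions made for this example and do not reflect how DataRobot implements these tasks.

```python
from collections import Counter


def ordinal_encode(values):
    """Map each category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping


def category_count(values):
    """Replace each category with the number of times it occurs."""
    counts = Counter(values)
    return [counts[v] for v in values]


def sparse_one_hot(values, mapping):
    """Keep only the (row, column) positions of the 1s, not the full matrix."""
    return [(row, mapping[v]) for row, v in enumerate(values)]


# Hypothetical toy column used only for this illustration.
colors = ["red", "green", "red", "blue"]
codes, mapping = ordinal_encode(colors)
counts = category_count(colors)
ones = sparse_one_hot(colors, mapping)
```

Storing only the positions of the non-zero entries is what keeps one-hot encoding practical when a column has a very large number of distinct categories.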

#### Numerical

* Binning of numerical variables
* Constant splines 
* Missing values imputed
* Numeric data cleansing
* Partial Principal Components Analysis
* Truncated Singular Value Decomposition
* Normalizer

#### Geospatial

* Geospatial Location Converter
* Spatial Neighborhood Featurizer

#### Images

* Greyscale Downscaled Image Featurizer
* No Post Processing
* OpenCV detect largest rectangle
* OpenCV image featurizer
* Pre-trained multi-level global average pooling image featurizer

#### Text models

* Character / word n-grams
* Pretrained byte-pair encoders (best of both worlds for char-grams and n-grams)
* Stopword removal
* TF-IDF scaling (optional sublinear scaling and binormal separation scaling)
* Hashing vectorizers for big data
* Cosine similarity between pairs of text columns (on datasets with 2+ text columns)
* Support for all languages, including English, Japanese, Chinese, Korean, French, Spanish, Portuguese, Arabic, Ukrainian, Klingon, Elvish, Esperanto, etc.
* Unsupervised fastText models
* Linear n-gram models (character/word n-grams + TF-IDF + penalized linear/logistic regression)
* SVD n-gram models (n-grams + TF-IDF + SVD)
* Naive Bayes weighted SVM
* TinyBERT / RoBERTa / MiniLM embedding models
* Text CNNs
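
To make the word n-gram and TF-IDF items above more concrete, here is a minimal standard-library sketch. It assumes whitespace tokenization and the plain `ln(N / df)` form of inverse document frequency; other smoothing variants exist, and the variant DataRobot uses is not stated here. The `sublinear` flag replaces the raw term count with `1 + ln(tf)`.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1, sublinear=False):
    """Compute TF-IDF weights per document.

    idf = ln(N / df); with sublinear=True, tf is replaced by 1 + ln(tf).
    """
    grams = [Counter(word_ngrams(d, n)) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for g in grams for term in g)
    n_docs = len(docs)
    weights = []
    for g in grams:
        row = {}
        for term, tf in g.items():
            tf_weight = 1.0 + math.log(tf) if sublinear else float(tf)
            row[term] = tf_weight * math.log(n_docs / df[term])
        weights.append(row)
    return weights


# Hypothetical toy corpus used only for this illustration.
docs = ["the cat sat", "the dog sat down", "the cat ran"]
w = tfidf(docs, n=1, sublinear=True)
bigrams = word_ngrams("The cat sat", 2)
```

In this toy corpus, a term that appears in every document (such as "the") receives a weight of zero, while a term unique to one document receives the largest inverse document frequency.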

#### Generalized Linear Models

* NA imputation (methods for missing at random and missing not at random), standardization, ridit transform
* Search for best transformations
* Efficient, sparse one-hot encoding for extremely high cardinality categorical variables

## Linear or additive models {: #linear-or-additive-models }

#### Generalized Linear Models

* Penalty: L1 (Lasso), L2 (Ridge), ElasticNet, None (Logistic Regression)
* Distributions: Binomial, Gaussian, Poisson, Tweedie, Gamma, Huber
* Special Cases: 2-stage model (Binomial + Gaussian) for zero-inflated regression
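
The two-stage idea for zero-inflated targets can be sketched as follows: one stage estimates the probability that the target is non-zero, a second stage estimates the target's size given that it is non-zero, and the prediction is the product of the two. In this sketch, simple per-group averages stand in for the fitted Binomial and Gaussian models; the function names and data are hypothetical, and this is not DataRobot's implementation.

```python
from collections import defaultdict


def fit_two_stage(groups, targets):
    """Fit per-group estimates for both stages.

    Stage 1: P(y > 0 | group), the share of non-zero targets.
    Stage 2: E[y | y > 0, group], the mean of the non-zero targets.
    """
    by_group = defaultdict(list)
    for g, y in zip(groups, targets):
        by_group[g].append(y)
    model = {}
    for g, ys in by_group.items():
        nonzero = [y for y in ys if y > 0]
        p_nonzero = len(nonzero) / len(ys)
        mean_nonzero = sum(nonzero) / len(nonzero) if nonzero else 0.0
        model[g] = (p_nonzero, mean_nonzero)
    return model


def predict_two_stage(model, group):
    """Expected value = P(y > 0) * E[y | y > 0]."""
    p_nonzero, mean_nonzero = model[group]
    return p_nonzero * mean_nonzero


# Hypothetical toy data: most rows have a zero target.
groups = ["a", "a", "a", "a", "b", "b"]
claims = [0.0, 0.0, 0.0, 100.0, 0.0, 50.0]
model = fit_two_stage(groups, claims)
pred_a = predict_two_stage(model, "a")
pred_b = predict_two_stage(model, "b")
```

Splitting the problem this way lets each stage use a distribution suited to its part of the data, rather than forcing a single model to fit a target dominated by zeros.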

#### Support Vector Machines

* Penalty: L1 (Lasso), L2 (Ridge), ElasticNet, None
* Kernel: Linear, Nyström RBF, RBF
* liblinear and libsvm

#### Generalized Additive Models
* GAM
* GA2M

## Tree-based models {: #tree-based-models }
* Decision Tree (or CART)
* Random Forest 
* ExtraTrees (or Extremely Randomized Forests)
* Gradient Boosted Trees (or GBM: Binomial, Gaussian, Poisson, Tweedie, Gamma, Huber)
* Extreme Gradient Boosted Trees (or XGBoost: Binomial, Gaussian, Poisson)
* LightGBM 
* AdaBoost
* RuleFit

## Deep learning and foundational models {: #deep-learning-and-foundational-models }
* Keras MLPs with residual connections, adaptive learning rates and adaptive batch sizes
* Keras self-normalizing MLPs with residual connections
* Keras neural architecture search MLPs using hyperband
* DeepCTR
	* Neural Factorization Machines
	* AutoInt
	* Cross Networks
* Pretrained CNNs for images using foundational models (especially EfficientNet)
	* Manually pruned and optimized for faster inference
* Pretrained + fine-tuned CNNs for images
* Image augmentation
* Pretrained TinyBERT models for text
* Keras Text CNNs
* fastText models for text

## Time series-specific models {: #time-series-specific-models }

* LSTMs
* DeepAR models
* AutoARIMA
* ETS, aka exponential smoothing
* TBATS
* Prophet

## Unsupervised models {: #unsupervised-models }

#### Anomaly detection models

* Isolation Forest
* Local Outlier Factor
* One-Class SVM
* Double Median Absolute Deviation
* Mahalanobis Distance
* Anomaly Detection Blenders
* Keras Deep Autoencoder
* Keras Deep Variational Autoencoder
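
As one example from the list above, Double Median Absolute Deviation scores a value by its distance from the median, scaled by a separate MAD for each side of the median so that skewed data is handled more fairly. The sketch below is a common textbook formulation for a single numeric column, written with the standard library only; the function name and the handling of a zero MAD are assumptions, and DataRobot's version may differ.

```python
from statistics import median


def double_mad_scores(values):
    """Score each value by |value - median| divided by the MAD of its side."""
    m = median(values)
    # Separate spread estimates for the lower and upper halves of the data.
    left_mad = median(abs(v - m) for v in values if v <= m)
    right_mad = median(abs(v - m) for v in values if v >= m)
    scores = []
    for v in values:
        mad = left_mad if v <= m else right_mad
        if mad == 0:
            # Assumption: with no spread on this side, any deviation is extreme.
            scores.append(0.0 if v == m else float("inf"))
        else:
            scores.append(abs(v - m) / mad)
    return scores


# Hypothetical toy data with one obvious outlier.
values = [1, 2, 3, 4, 5, 6, 100]
scores = double_mad_scores(values)
```

A larger score means the value lies further from the median relative to the typical spread on its side; a threshold on the score then flags anomalies.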

#### Clustering models

* K-Means
* HDBSCAN

## Other model types {: #other-model-types }

* Eureqa (proprietary genetic algorithm for symbolic regression)
* K-Nearest Neighbors (three distances)
* Partial least squares (used for blenders)
* Isotonic Regression (used for calibrating predictions from other models)
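
To illustrate how isotonic regression can calibrate predictions, the sketch below uses the Pool Adjacent Violators algorithm with equal weights to find the non-decreasing sequence closest (in squared error) to a list of observed outcome rates, ordered by model score. The function name and the data are hypothetical, and this is a generic formulation rather than DataRobot's implementation.

```python
def isotonic_fit(y):
    """Pool Adjacent Violators: best non-decreasing fit to y (equal weights)."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical observed outcome rates, ordered by increasing model score.
observed = [0.0, 0.5, 0.25, 0.75, 0.5, 1.0]
calibrated = isotonic_fit(observed)
```

Wherever the observed rates decrease as the score increases, the algorithm pools those neighbors into their average, so the calibrated values never decrease with the model score.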

Click a [blueprint](blueprints){ target=_blank } node to access full model documentation. Using [Composable ML](cml/index){ target=_blank }, you can build models that best suit your needs using built-in tasks and custom Python/R code.
